Search results for "Edit distance"

showing 8 items of 8 documents

Efficient algorithm for learning simple regular expressions from noisy examples

1994

We present an efficient algorithm for finding approximate repetitions in a given sequence of characters. First, we define a class of simple regular expressions which are of star-height one and do not contain union operations, and a stochastic mutation process of a given length over a string of characters. Then, assuming that a given string of characters is obtained, corrupted by the defined mutation process, from some long enough word generated by a simple regular expression, we try to restore the expression. We prove that to within some reasonable accuracy it is always possible if the length of the mutation process is bounded compared to the length of the example. We provide an algorithm by…

Discrete mathematics; Regular language; Computer science; Bounded function; String (computer science); Mutation (genetic algorithm); Edit distance; Regular expression; Expression (computer science); Time complexity
researchProduct

A novel XML document structure comparison framework based-on sub-tree commonalities and label semantics

2012

XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficient…

Document Structure Description; Computer Networks and Communications; computer.internet_protocol; Computer science; Efficient XML Interchange; [SCCO.COMP]Cognitive science/Computer science; 0102 computer and information sciences; 02 engineering and technology; computer.software_genre; 01 natural sciences; Semantic similarity; XML Schema Editor; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; XML schema; computer.programming_language; Information retrieval; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; [INFO.INFO-WB]Computer Science [cs]/Web; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; XML validation; computer.file_format; Document clustering; Human-Computer Interaction; XML framework; Tree (data structure); XML database; Tree structure; 010201 computation theory & mathematics; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; 020201 artificial intelligence & image processing; Semi-structured data; Edit distance; computer; Software; XML; XML Catalog; Data integration
researchProduct

Vector representation of non-standard spellings using dynamic time warping and a denoising autoencoder

2017

The presence of non-standard spellings in Twitter causes challenges for many natural language processing tasks. Traditional approaches mainly regard the problem as a translation, spell checking, or speech recognition problem. This paper proposes a method that represents the stochastic relationship between words and their non-standard versions in real vectors. The method uses dynamic time warping to preprocess the non-standard spellings and an autoencoder to derive the vector representation. The derived vectors encode word patterns and the Euclidean distance between the vectors represents a distance in the word space that challenges the prevailing edit distance. After training the autoencoder o…

Dynamic time warping; Artificial neural network; Computer science; business.industry; Speech recognition; 020208 electrical & electronic engineering; Pattern recognition; Context (language use); 02 engineering and technology; 010501 environmental sciences; Translation (geometry); 01 natural sciences; Autoencoder; Euclidean distance; 0202 electrical engineering electronic engineering information engineering; Edit distance; Artificial intelligence; Hidden Markov model; business; Word (computer architecture); 0105 earth and related environmental sciences; 2017 IEEE Congress on Evolutionary Computation (CEC)
researchProduct

BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures

2018

Motivation: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. Results: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance perf…

Statistics and Probability; 0303 health sciences; Multi-core processor; Xeon; Computer science; business.industry; 030302 biochemistry & molecular biology; Sequence alignment; Sequence Analysis DNA; Parallel computing; Biochemistry; Computer Science Applications; 03 medical and health sciences; Computational Mathematics; Titan (supercomputer); Software; Computational Theory and Mathematics; Edit distance; business; Sequence Alignment; Molecular Biology; Algorithms; Software; Xeon Phi; 030304 developmental biology; Bioinformatics
researchProduct

Top-k String Similarity Joins

2020

Top-k joins have been extensively studied in relational databases as ranking operations when every object has, among others, at least one ranking attribute. However, the focus has mostly been the case when the join attributes are of primitive data types (e.g., numerical values) and the join predicate is equality. In this work, we consider string objects assigned such ranking attributes or simply scores. Given two collections of string objects and a string similarity measure (e.g., the Edit distance), we introduce the top-k string similarity join, which returns k sufficiently similar pairs of objects with respect to a similarity threshold ϵ, which have the highest combined score computed by…

Theoretical computer science; Similarity (network science); Computer science; String (computer science); Joins; Join (sigma algebra); Edit distance; String metric; Aggregate function; Ranking (information retrieval); 32nd International Conference on Scientific and Statistical Database Management
researchProduct

High Locality Representations for Automated Programming

2011

We study the locality of the genotype-phenotype mapping used in grammatical evolution (GE). GE is a variant of genetic programming that can evolve complete programs in an arbitrary language using a variable-length binary string. In contrast to standard GP, which applies search operators directly to phenotypes, GE uses an additional mapping and applies search operators to binary genotypes. Therefore, there is a large semantic gap between genotypes (binary strings) and phenotypes (programs or expressions). The case study shows that the mapping used in GE has low locality leading to low performance of standard mutation operators. The study at hand is an example of how basic design principles o…

Theoretical computer science; business.industry; Computer science; Locality; Parse tree; Genetic programming; computer.software_genre; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Grammatical evolution; Local search (optimization); Edit distance; Artificial intelligence; Heuristics; business; computer; Natural language processing; Semantic gap
researchProduct

Toward Approximate GML Retrieval Based on Structural and Semantic Characteristics

2010

GML is emerging as the new standard for representing geographic information in GISs on the Web, allowing the encoding of structurally and semantically rich geographic data in self-describing XML-based geographic entities. In this study, we address the problem of approximate querying and ranked results for GML data and provide a method for GML query evaluation. Our method consists of two main contributions. First, we propose a tree model for representing GML queries and data collections. Then, we introduce a GML retrieval method based on the concept of tree edit distance as an efficient means for comparing semi-structured data. Our approach allows the evaluation of bo…

[ INFO.INFO-IR ] Computer Science [cs]/Information Retrieval [cs.IR]; Tree edit distance; Similarity (geometry); [INFO.INFO-WB] Computer Science [cs]/Web; Computer science; computer.internet_protocol; [ INFO.INFO-WB ] Computer Science [cs]/Web; [SCCO.COMP]Cognitive science/Computer science; 02 engineering and technology; computer.software_genre; [SCCO.COMP] Cognitive science/Computer science; 020204 information systems; Encoding (memory); 0202 electrical engineering electronic engineering information engineering; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; [ INFO.INFO-MM ] Computer Science [cs]/Multimedia [cs.MM]; [INFO.INFO-MM] Computer Science [cs]/Multimedia [cs.MM]; Information retrieval; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; GML Search; Structural & Semantic Similarity; [INFO.INFO-WB]Computer Science [cs]/Web; Process (computing); [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; GIS; Constraint (information theory); [ INFO.INFO-DB ] Computer Science [cs]/Databases [cs.DB]; [ SCCO.COMP ] Cognitive science/Computer science; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; Ranked retrieval; 020201 artificial intelligence & image processing; Data mining; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]; computer; XML; Decision tree model
researchProduct

Skeleton-Based Multiview Reconstruction

2016

The advantage of skeleton-based 3D reconstruction is to completely generate a single 3D object from well-chosen views. Having numerous views is necessary for a reliable reconstruction but projections of skeletons lead to different topologies. We reconstruct 3D objects with curved medial axis (whose topology is a tree) from the perspective skeletons on an arbitrary number of calibrated acquisitions. The main contribution is to estimate the 3D skeleton, from multiple images: its topology is chosen as the closest to those of the perspective skeletons on the set of images, which means that the number of topology changes to map the 3D skeleton topology to topologies on im…

topology; reconstruction; [SPI] Engineering Sciences [physics]; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; Iterative reconstruction; Skeleton (category theory); Network topology; Graph-edit distance; Topology; [SPI]Engineering Sciences [physics]; Image processing; Medial axis; [ INFO.INFO-TI ] Computer Science [cs]/Image Processing; 0202 electrical engineering electronic engineering information engineering; [ SPI ] Engineering Sciences [physics]; Signal and image processing; Computer vision; Image synthesis and virtual reality; Topology (chemistry); Skeleton; Mathematics; ComputingMethodologies_COMPUTERGRAPHICS; business.industry; 3D reconstruction; Perspective (graphical); 020207 software engineering; Computer vision and pattern recognition; Artificial intelligence; [SPI.TRON] Engineering Sciences [physics]/Electronics; [ SPI.TRON ] Engineering Sciences [physics]/Electronics; [SPI.TRON]Engineering Sciences [physics]/Electronics; [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]; Shock graphs; [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV]; graph-edit distance; 020201 artificial intelligence & image processing; Topological skeleton; Artificial intelligence; Shapes; Reconstruction; business
researchProduct